A/B tests are very commonly performed by data analysts and data scientists. It is important to get some practice working through the difficulties of these tests.
For this project, you will be working to understand the results of an A/B test run by an e-commerce website. Your goal is to work through this notebook to help the company understand if they should implement the new page, keep the old page, or perhaps run the experiment longer to make their decision.
import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
# Set the seed so results are reproducible and match the quiz answers
random.seed(42)
Now, read in the ab_data.csv data. Store it in df.
df = pd.read_csv('ab_data.csv')
df.head(5)
The number of rows in the dataset.
df.shape[0]
The number of unique users in the dataset.
df.user_id.nunique()
The proportion of users converted.
df.converted.mean()
The number of times the new_page and treatment don't match.
df_control = df.query('group == "control"')
df_treat = df.query('group == "treatment"')
df_control_wrong = df_control.query('landing_page == "new_page"')
df_treat_wrong = df_treat.query('landing_page == "old_page"')
df_control_wrong.shape[0] + df_treat_wrong.shape[0]
Do any of the rows have missing values?
df.isnull().sum()
For the rows where treatment does not match with new_page or control does not match with old_page, we cannot be sure if this row truly received the new or old page.
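The mismatch filter can be illustrated on a toy frame before applying it to the real data. The column names match ab_data.csv, but the rows below are made up:

```python
import pandas as pd

# Hypothetical frame with one deliberately mismatched row
toy = pd.DataFrame({
    'group': ['control', 'treatment', 'control'],
    'landing_page': ['old_page', 'new_page', 'new_page'],
})

# A row is mismatched when the two boolean conditions disagree:
# treatment should pair with new_page, control with old_page
mismatched = toy[(toy['group'] == 'treatment') != (toy['landing_page'] == 'new_page')]
print(mismatched)  # only the control/new_page row remains
```

The `!=` between the two boolean Series acts as an exclusive-or, flagging exactly the rows where group and page disagree.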
Drop rows with the wrong values
df2 = df.drop(df_control_wrong.index)
df2.drop(df_treat_wrong.index, inplace = True)
# Double Check all of the correct rows were removed - this should be 0
df2[(df2['group'] == 'treatment') != (df2['landing_page'] == 'new_page')].shape[0]
How many unique user_ids are in df2?
df2.user_id.nunique()
There is one user_id repeated in df2. What is it?
# Find duplicated user_ids and keep them in a frame in case we need it later
dup_df = df2[df2.user_id.duplicated(keep=False)]
dup_df.user_id.unique()
What is the row information for the repeat user_id?
#Show me the dups
dup_df
Remove one of the rows with a duplicate user_id, but keep dataframe as df2.
# Drop duplicate user_ids in place, keeping the first occurrence
df2.drop_duplicates(subset="user_id", inplace=True)
#Confirm our user id still exists but only once
df2.query('user_id == 773192')
What is the probability of an individual converting regardless of the page they receive?
conv = df2.converted.mean()
conv
Given that an individual was in the control group, what is the probability they converted?
control_conv = df2.query('group == "control"').converted.mean()
control_conv
Given that an individual was in the treatment group, what is the probability they converted?
treat_conv = df2.query('group == "treatment"').converted.mean()
treat_conv
What is the probability that an individual received the new page?
# Proportion of rows whose landing_page is new_page
p_new_page = (df2['landing_page'] == 'new_page').mean()
p_new_page
Observations
Notice that because of the time stamp associated with each event, you could technically run a hypothesis test continuously as each observation was observed.
However, then the hard question is do you stop as soon as one page is considered significantly better than another or does it need to happen consistently for a certain amount of time? How long do you run to render a decision that neither page is better than another?
These questions are the difficult parts associated with A/B tests in general.
For now, consider you need to make the decision just based on all the data provided. If you want to assume that the old page is better unless the new page proves to be definitely better at a Type I error rate of 5%, what should your null and alternative hypotheses be? You can state your hypothesis in terms of words or in terms of $p_{old}$ and $p_{new}$, which are the converted rates for the old and new pages.
Hypothesis
Using mathematical notation, this can be written as follows:
$$H_0: p_{new} - p_{old} \leq 0$$

$$H_1: p_{new} - p_{old} > 0$$

Assume under the null hypothesis that $p_{new}$ and $p_{old}$ both have "true" success rates equal to the converted success rate regardless of page - that is, $p_{new}$ and $p_{old}$ are equal. Furthermore, assume they are equal to the converted rate in ab_data.csv regardless of the page.
Use a sample size for each page equal to the ones in ab_data.csv.
Simulate the sampling distribution for the difference in converted rates between the two pages over 10,000 iterations, calculating an estimate from the null each time.
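As a sanity check on the simulation's spread, standard theory says that under these null assumptions the simulated differences should be approximately normal around zero with standard error

$$SE = \sqrt{p\,(1-p)\left(\frac{1}{n_{new}} + \frac{1}{n_{old}}\right)}$$

where $p$ is the pooled converted rate. The histogram produced later should roughly match this width.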
What is the conversion rate for $p_{new}$ under the null?
# Get the converted rate regardless of the page
p_new = df2.converted.mean()
p_new
What is the conversion rate for $p_{old}$ under the null?
# Under the null, p_old equals the same pooled converted rate
p_old = df2.converted.mean()
p_old
What is $n_{new}$, the number of individuals in the treatment group?
# count the number of users who received the new_page
n_new = df2[df2['landing_page'] == 'new_page'].shape[0]
n_new
What is $n_{old}$, the number of individuals in the control group?
# count the number of users who received the old_page
n_old = df2[df2['landing_page'] != 'new_page'].shape[0]
n_old
Simulate $n_{new}$ transactions with a conversion rate of $p_{new}$ under the null. Store these $n_{new}$ 1's and 0's in new_page_converted.
# Get sample choice between [0,1]
# with sample size equal to number of users in df
# given probability
new_page_converted = np.random.choice([0,1], size = n_new, p=[1-p_new,p_new])
Simulate $n_{old}$ transactions with a conversion rate of $p_{old}$ under the null. Store these $n_{old}$ 1's and 0's in old_page_converted.
# Get sample choice between [0,1]
# with sample size equal to number of users in df
# given probability
old_page_converted = np.random.choice([0,1], size = n_old,replace=True, p=[1-p_old,p_old])
Find $p_{new}$ - $p_{old}$ for your simulated values from part (e) and (f).
sample_diff = new_page_converted.mean() - old_page_converted.mean()
sample_diff
Create 10,000 $p_{new}$ - $p_{old}$ values using the same simulation process you used in parts (a) through (g) above. Store all 10,000 values in a NumPy array called p_diffs.
p_diffs = []
for _ in range(10000):
    new_page_converted = np.random.choice([0, 1], size=n_new, replace=True, p=[1 - p_new, p_new])
    old_page_converted = np.random.choice([0, 1], size=n_old, replace=True, p=[1 - p_old, p_old])
    p_diffs.append(new_page_converted.mean() - old_page_converted.mean())
p_diffs = np.asarray(p_diffs)
actual_diff = treat_conv - control_conv
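The Python loop above can be replaced with a single vectorized draw: each binomial sample counts the conversions in one simulated experiment, so dividing by the sample size gives all 10,000 simulated rates at once. A sketch with placeholder numbers standing in for n_new, n_old, and the pooled rate:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes and pooled rate standing in for n_new, n_old, p_new
n_new_, n_old_, p_null = 1000, 1000, 0.12

# One binomial draw per simulated experiment; divide by n for conversion rates
sim_new = rng.binomial(n_new_, p_null, size=10000) / n_new_
sim_old = rng.binomial(n_old_, p_null, size=10000) / n_old_
p_diffs_fast = sim_new - sim_old
```

This is equivalent in distribution to the loop but runs orders of magnitude faster, since the work moves into a handful of NumPy calls.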
Plot a histogram of the p_diffs.
# plot histogram of the p_diffs
plt.subplots(figsize=(10, 10))
plt.hist(p_diffs, color='#22a0ff', bins=20)
plt.xlabel('Difference of Convert Rate')
plt.ylabel('Frequency')
plt.title('Simulated Difference of Convert Rate between New Page and Old Page')
# plot the line of actual difference
plt.axvline(actual_diff , c='red', linestyle='dashed', linewidth=1, label="Actual Difference Observed")
plt.legend()
plt.show()
What proportion of the p_diffs are greater than the actual difference observed in ab_data.csv?
(p_diffs > actual_diff).mean()
Observations
Typically we reject the null when the P-Value falls below the significance level $\alpha$ (commonly 0.05), which is the probability of committing a Type I error if the null is true.
Our simulated P-Value is about 0.9, which is large: roughly 90% of the differences drawn under the null distribution are larger than the difference actually observed, so the observed difference is entirely consistent with the null.
Given a P-Value of 0.9, we fail to reject the null and conclude that there is insufficient evidence of a difference in conversion between the groups.
We could also use a built-in to achieve similar results. Though using the built-in might be easier to code, the above portions are a walkthrough of the ideas that are critical to correctly thinking about statistical significance.
import statsmodels.api as sm
convert_old = df2.query('landing_page == "old_page" & converted == 1').converted.count()
convert_new = df2.query('landing_page == "new_page" & converted == 1').converted.count()
Now use stats.proportions_ztest to compute your test statistic and p-value. Here is a helpful link on using the built in.
z_score, p_value = sm.stats.proportions_ztest \
([convert_new, convert_old], [n_new, n_old], alternative='larger')
print('z-score: ',z_score ,', p-value: ', p_value)
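Because proportions_ztest relies on a normal approximation, the one-sided p-value can be cross-checked directly from the z-score via the standard normal survival function. A minimal sketch using only the standard library (the 1.645 below is the textbook 5% cutoff, not a value from this data):

```python
import math

def one_sided_p(z):
    # Survival function of the standard normal: P(Z > z)
    return 0.5 * math.erfc(z / math.sqrt(2))

# A z-score of 1.645 corresponds to the familiar one-sided 5% threshold
print(round(one_sided_p(1.645), 3))  # ≈ 0.05
```

Feeding the z-score reported above into this function should reproduce the p-value from proportions_ztest.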
Observations
The z-score and p-value align with those obtained from the simulation-based hypothesis test above.
In this final part, we will see that the result you achieved in the A/B test in Part II above can also be achieved by performing regression.
Since each row is either a conversion or no conversion, we will be performing logistic regression.
Logistic Regression.
The goal is to use statsmodels to fit the regression model specified to see if there is a significant difference in conversion based on which page a customer receives. However, first we need to create in df2 a column for the intercept, and create a dummy variable column for which page each user received.
# get_dummies returns columns in alphabetical order (new_page, old_page),
# so ab_page is 1 when the user saw the new page
df2[['ab_page', 'old_page']] = pd.get_dummies(df2['landing_page'])
df2['intercept'] = 1
Use statsmodels to instantiate the regression model on the two columns created, then fit the model using the two columns created to predict whether or not an individual converts.
log_m = sm.Logit(df2['converted'], df2[['intercept', 'ab_page']])
results = log_m.fit()
Provide the summary of your model below
results.summary()
Observations.
It would be advantageous to consider other factors in the regression model. A multiple regression can help us gauge the relative influence of several predictor variables on the response. Since the page alone does not explain the converted rate, we can investigate other factors, for example the country a user originates from or the time a user spends on the website.
However, adding terms has drawbacks: we should watch for multicollinearity among the additional variables, and non-linearity of the response-predictor relationships may emerge as more factors are included.
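A quick screen for the multicollinearity concern is a pairwise correlation matrix: predictors correlated near +/-1 are candidates for trouble. A sketch on hypothetical predictors, where x2 is nearly a copy of x1 and x3 is independent:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical predictors: x2 is almost identical to x1, x3 is unrelated
x1 = rng.normal(size=500)
frame = pd.DataFrame({
    'x1': x1,
    'x2': x1 + rng.normal(scale=0.05, size=500),
    'x3': rng.normal(size=500),
})

# Pairwise correlations near +/-1 flag candidates for multicollinearity
corr = frame.corr()
print(corr.round(2))
```

For a more thorough check, variance inflation factors generalize this beyond pairwise relationships.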
Now along with testing if the conversion rate changes for different pages, also add an effect based on which country a user lives in. Read in the countries.csv dataset and merge together your datasets on the appropriate rows. Here are the docs for joining tables.
countries_df = pd.read_csv('countries.csv')
df_new = countries_df.set_index('user_id').join(df2.set_index('user_id'), how='inner')
df_new.head()
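The same join can also be written with pd.merge, which avoids the set_index calls. A sketch on toy frames standing in for countries_df and df2 (the values are made up):

```python
import pandas as pd

# Hypothetical stand-ins for countries_df and df2
countries_toy = pd.DataFrame({'user_id': [1, 2, 3], 'country': ['US', 'UK', 'CA']})
ab_toy = pd.DataFrame({'user_id': [1, 2], 'converted': [0, 1]})

# Inner merge keeps only user_ids present in both frames
merged = countries_toy.merge(ab_toy, on='user_id', how='inner')
print(merged)
```

Both approaches produce the same rows; merge simply keeps user_id as a regular column rather than the index.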
# Find unique values of Country
df_new['country'].unique()
# Add dummy variables
df_new[['CA', 'UK', 'US']] = pd.get_dummies(df_new['country'])
df_new.head()
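Note that keeping all three country dummies alongside an intercept would make one column redundant; the model below handles this by including only CA and UK, leaving US as the baseline. pd.get_dummies can also do this automatically with drop_first. A sketch on a toy country series:

```python
import pandas as pd

country = pd.Series(['US', 'UK', 'CA', 'US'])

# drop_first=True drops the alphabetically-first level (CA here),
# leaving it as the baseline and avoiding the dummy-variable trap
dummies = pd.get_dummies(country, drop_first=True)
print(list(dummies.columns))  # ['UK', 'US']
```

Whichever level is dropped becomes the reference category against which the remaining coefficients are interpreted.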
# Fit our new model
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'CA', 'UK']])
result = log_mod.fit()
result.summary()
Observations
Though we have now looked at the individual effects of country and page on conversion, we would also like to look at an interaction between page and country to see if there are significant effects on conversion. We will create the necessary additional columns and fit the new model.
Adding an interaction variable between page and country to see if there are significant effects on conversion
df_new['CA_page'] = df_new['CA'] * df_new['ab_page']
df_new['UK_page'] = df_new['UK'] * df_new['ab_page']
df_new.head()
log_mod = sm.Logit(df_new['converted'], df_new[['intercept', 'ab_page', 'CA', 'UK', 'CA_page', 'UK_page']])
result = log_mod.fit()
result.summary()
Observations
After utilising interaction variables in the logistic regression model, there is still no variable with a significant p-value. We therefore fail to reject the null hypothesis: there is insufficient evidence that an interaction between country and page predicts whether a user converts.
After our data analysis, based on the information available to us, we do not have sufficient evidence to suggest that the new page will result in more conversions than the old page.
from subprocess import call
call(['python', '-m', 'nbconvert', 'Analyze_ab_test_results_notebook.ipynb'])